In [1]:
from preamble import *
%matplotlib notebook
In [5]:
# read data.
# you can find a description in bank/bank-campaign-desc.txt
data = pd.read_csv("data/bank-campaign.csv")
In [6]:
data.shape
Out[6]:
In [7]:
data.columns
Out[7]:
In [8]:
data.head()
Out[8]:
In [9]:
y = data.target.values
In [10]:
X = data.drop("target", axis=1).values
In [11]:
X.shape
Out[11]:
In [12]:
y.shape
Out[12]:
In scikit-learn, data is always a NumPy array (or SciPy sparse matrix) of shape (n_samples, n_features).
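To make the shape convention concrete, here is a minimal sketch using a small made-up array (`X_toy` and `y_toy` are hypothetical names, not part of the bank dataset). It shows that rows are samples, columns are features, and that a single feature still needs to be two-dimensional.

```python
import numpy as np

# Hypothetical toy data: 4 samples with 3 features each.
X_toy = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0],
                  [7.0, 8.0, 9.0],
                  [10.0, 11.0, 12.0]])
y_toy = np.array([0, 1, 0, 1])  # one target value per sample

n_samples, n_features = X_toy.shape
print(X_toy.shape)  # (4, 3)
print(y_toy.shape)  # (4,)

# A single feature is still passed as a 2-D array:
# reshape (n_samples,) into (n_samples, 1).
single_feature = X_toy[:, 0].reshape(-1, 1)
print(single_feature.shape)  # (4, 1)
```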
Splitting the data:
In [13]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
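As a rough illustration of what the split does, the sketch below runs `train_test_split` on made-up data (`X_demo` and `y_demo` are hypothetical), assuming a scikit-learn version that provides `sklearn.model_selection`. By default a quarter of the samples go to the test set; the optional `stratify` argument keeps the class proportions equal in both parts.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 20 samples, 2 features, two balanced classes.
X_demo = np.arange(40).reshape(20, 2)
y_demo = np.array([0] * 10 + [1] * 10)

# Default behaviour: 75% training data, 25% test data.
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=0)
print(X_tr.shape, X_te.shape)  # (15, 2) (5, 2)

# Stratified 50/50 split: each half keeps the 1:1 class ratio.
X_tr_s, X_te_s, y_tr_s, y_te_s = train_test_split(
    X_demo, y_demo, test_size=0.5, stratify=y_demo, random_state=0)
print(np.bincount(y_te_s))  # [5 5]
```

Fixing `random_state` makes the shuffle reproducible, which is why the notebook passes `random_state=0`.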
In [16]:
# import model
from sklearn.linear_model import LogisticRegression
# instantiate model, set parameters
lr = LogisticRegression()
# fit model
lr.fit(X_train, y_train)
Out[16]:
Make predictions:
In [17]:
lr.predict(X_train)[:10]
Out[17]:
In [18]:
lr.score(X_train, y_train)
Out[18]:
In [19]:
lr.score(X_test, y_test)
Out[19]:
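For classifiers, `score` returns the mean accuracy, that is, the fraction of correctly predicted samples. The following sketch checks this by hand on synthetic data from `make_classification` (a stand-in chosen here for illustration, not the bank data).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption: not the bank-campaign dataset).
X_demo, y_demo = make_classification(n_samples=200, n_features=5,
                                     random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=0)

clf = LogisticRegression().fit(X_tr, y_tr)

# Accuracy computed by hand: fraction of predictions that match the labels.
manual_accuracy = np.mean(clf.predict(X_te) == y_te)

# score() reports the same quantity.
print(manual_accuracy, clf.score(X_te, y_te))
```

Comparing the training score with the test score, as the two cells above do, gives a first hint of whether the model overfits.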
In [ ]:
| ``model.fit(X_train, [y_train])`` | |
|---|---|
| ``model.predict(X_test)`` | ``model.transform(X_test)`` |
| Classification | Preprocessing |
| Regression | Dimensionality Reduction |
| Clustering | Feature Extraction |
| | Feature selection |
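The right-hand column of the table can be sketched with a preprocessing step. The example below uses `StandardScaler` on made-up data (the arrays are hypothetical): the scaler is fitted on the training data only, and the same learned transformation is then applied to both parts.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data: two features with mean around 10 and spread around 3.
rng = np.random.RandomState(0)
X_tr = rng.normal(loc=10.0, scale=3.0, size=(100, 2))
X_te = rng.normal(loc=10.0, scale=3.0, size=(20, 2))

scaler = StandardScaler()
scaler.fit(X_tr)                      # learn mean and std from training data

X_tr_scaled = scaler.transform(X_tr)  # training data: zero mean, unit variance
X_te_scaled = scaler.transform(X_te)  # test data: same shift and scale reused

print(X_tr_scaled.mean(axis=0))  # close to 0 for each feature
print(X_tr_scaled.std(axis=0))   # close to 1 for each feature
```

Note that transformers are fitted without `y` here, which is why the table writes `y_train` in brackets.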
Model evaluation: score(X, [y])
Uncertainties from classifiers: decision_function(X) and predict_proba(X)
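The two uncertainty methods can be illustrated as follows, again on synthetic data from `make_classification` (an assumption for the sake of a runnable sketch). For a binary `LogisticRegression`, `decision_function` returns one signed score per sample, and `predict_proba` returns one probability per class.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary classification data (stand-in, not the bank dataset).
X_demo, y_demo = make_classification(n_samples=200, n_features=5,
                                     random_state=0)
clf = LogisticRegression().fit(X_demo, y_demo)

scores = clf.decision_function(X_demo[:5])  # shape (5,): signed scores
proba = clf.predict_proba(X_demo[:5])       # shape (5, 2): class probabilities

print(scores.shape, proba.shape)
print(proba.sum(axis=1))  # each row sums to 1

# The predicted class is the one with the highest probability ...
print(clf.classes_[proba.argmax(axis=1)])
# ... and, in the binary case, a positive score means class 1 is predicted.
print(scores > 0)
```

Not every classifier implements both methods, so it is worth checking which one a given model provides.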
In [20]:
# this is short, but we never stored the model
LogisticRegression().fit(X_train, y_train).score(X_test, y_test)
Out[20]:
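The one-liner works because `fit` returns the estimator itself, so calls can be chained. A small sketch on synthetic data (hypothetical stand-in for the bank data) shows this and keeps a reference to the fitted model for later use.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data.
X_demo, y_demo = make_classification(n_samples=100, n_features=5,
                                     random_state=0)

model = LogisticRegression()
returned = model.fit(X_demo, y_demo)

# fit() returns the very same object, which is what makes chaining possible.
print(returned is model)  # True

# Storing the result of the chain keeps the fitted model around.
stored = LogisticRegression().fit(X_demo, y_demo)
print(stored.score(X_demo, y_demo))
```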
Load the dataset data/bike_day_raw.csv, which has the regression target cnt.
This dataset is hourly bike rentals in the citybike platform. The cnt column is the number of rentals, which we want to predict from date and weather data.
Split the data into a training and a test set using train_test_split.
Use the LinearRegression class to learn a regression model on this data. You can evaluate with the score method, which provides the $R^2$, or with the mean_squared_error function from sklearn.metrics (or write it yourself in NumPy).
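As a hint for the evaluation step, the sketch below computes the mean squared error and $R^2$ by hand in NumPy and compares them with scikit-learn's results. It uses synthetic data from `make_regression` rather than the bike data, so it is an illustration of the metrics only, not a solution to the exercise.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic regression data (assumption: stand-in for the bike dataset).
X_demo, y_demo = make_regression(n_samples=200, n_features=4, noise=10.0,
                                 random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=0)

reg = LinearRegression().fit(X_tr, y_tr)
y_pred = reg.predict(X_te)

# Mean squared error: average of the squared residuals.
mse_numpy = np.mean((y_te - y_pred) ** 2)

# R^2 = 1 - (residual sum of squares) / (total sum of squares).
ss_res = np.sum((y_te - y_pred) ** 2)
ss_tot = np.sum((y_te - y_te.mean()) ** 2)
r2_numpy = 1.0 - ss_res / ss_tot

print(mse_numpy, mean_squared_error(y_te, y_pred))
print(r2_numpy, reg.score(X_te, y_te))
```

An $R^2$ of 1 means perfect predictions, while 0 corresponds to always predicting the mean of the test targets.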
In [21]:
pd.read_csv("data/bike_day_raw.csv")
Out[21]:
In [22]:
import pandas as pd
pd.__version__
Out[22]: